Unsupervised Decomposition of a Multi-Author Document Based on Naive-Bayesian Model
نویسندگان
چکیده
This paper proposes a new unsupervised method for decomposing a multi-author document into authorial components. We assume that we do not know anything about the document and the authors, except the number of the authors of that document. The key idea is to exploit the difference in the posterior probability of the Naive-Bayesian model to increase the precision of the clustering assignment and the accuracy of the classification process of our method. Experimental results show that the proposed method outperforms two state-of-the-art methods.
منابع مشابه
A Validation Test Naive Bayesian Classification Algorithm and Probit Regression as Prediction Models for Managerial Overconfidence in Iran's Capital Market
Corporate directors are influenced by overconfidence, which is one of the personality traits of individuals; it may take irrational decisions that will have a significant impact on the company's performance in the long run. The purpose of this paper is to validate and compare the Naive Bayesian Classification algorithm and probit regression in the prediction of Management's overconfident at pre...
متن کاملAn improved algorithm for unsupervised decomposition of a multi-author document
This paper addresses the problem of unsupervised decomposition of a multi author text document: identifying the sentences that were written by each author assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison be...
متن کاملUnsupervised Multi-Author Document Decomposition Based on Hidden Markov Model
This paper proposes an unsupervised approach for segmenting a multiauthor document into authorial components. The key novelty is that we utilize the sequential patterns hidden among document elements when determining their authorships. For this purpose, we adopt Hidden Markov Model (HMM) and construct a sequential probabilistic model to capture the dependencies of sequential sentences and their...
متن کاملA New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
متن کاملAuthor gender identification from text using Bayesian Random Forest
Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...
متن کامل